Abstract. The notion of building blocks can be related to the structure of the offspring probability distribution:
loci whose variability is strongly correlated constitute a building block. We call this correlated exploration.
With this background we analyze the structure of the offspring probability distribution, or exploration distribution,
for a GA with mutation only, a crossover GA, and an Estimation-Of-Distribution Algorithm (EDA). The results
allow a precise characterization of the structure of the crossover exploration distribution. Essentially, the crossover
operator destroys mutual information between loci by transforming it into entropy; it does the inverse of correlated
exploration. In contrast, the objective of EDAs is to model the mutual information between loci in the fitness
distribution, and thereby they induce correlated exploration.



1 Introduction 

In the realm of evolutionary computation the notion of building blocks of evolution has been developed in
Holland's original works (Holland 1975; Holland 2000) to describe the effect of crossover. In that respect,
building blocks are composed of genes with more or less linkage between them. This corresponds one-to-one to the
notion of schemata and eventually led to the schema theories, which describe the evolution of these building
blocks.

In the biology literature, though, the notion of building blocks has quite a different connotation. As a paradigm
I choose the empirical findings of Halder, Callaerts, & Gehring (1995): The experimenters forced the mutation
of a single gene, called the "eyeless" gene, in early ontogenesis of a Drosophila melanogaster fly. This rather subtle
genotypic variation results in a severe phenotypic variation: an additional whole, functionally complete eye
module grows at some place it was not supposed to. Here, the notion of a building block refers to the eye as
a functional module which can be grown phenotypically by triggering a single gene. In other words, a single
(and thus non-correlated) mutation of a gene leads to a highly complex phenotypic variation that is highly
correlated in terms of physiological cell variables. Such properties of the genotype-phenotype mapping are considered
the basis of complex adaptation (Wagner & Altenberg 1996). A theory on the evolution of complex phenotypic
variability exists (Toussaint 2003b), and in this paper we show that the induced notion of building blocks is
completely different from the one induced by crossover.

Besides the discussion of crossover in GAs and that of functional modularity in natural evolution, there is a
third field of research that relates to the discussion of building blocks: Estimation-of-Distribution Algorithms
(EDAs, Pelikan, Goldberg, & Lobo 1999). These algorithms are a direct implementation of the idea of correlated
exploration in the framework of heuristic search algorithms. They explicitly encode the search distribution
(i.e., offspring probability distribution) by means of a product of marginals (PBIL, Baluja 1994), factorized
distributions (FDA, Mühlenbein, Mahnig, & Rodriguez 1999), dependency trees (Baluja & Davies 1997), or,
most generally, a Bayesian network (BOA, Pelikan, Goldberg, & Cantú-Paz 2000). In my view, the
key to these algorithms is that they are capable of inducing the same notion of building blocks as we introduced
in the context of natural evolution. For instance, consider a dependency tree where the leaves encode the
phenotypic variables. Offspring are generated by sampling this probabilistic model, i.e., by first sampling the
root variable of the tree, then, according to the dependencies encoded on the links, sampling the root's successor
nodes, etc. Now, if we assume that the dependencies are very strong, say, deterministic, it follows that a single
variation at the root leads to a completely correlated variation of all leaves. Hence, we may define a set of
leaves which, due to their dependencies, always vary in high correlation as a functional phenotypic module in
the same sense as for the eyeless paradigm.
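
The dependency-tree argument can be made concrete with a small sketch. The tree layout, the node names, and the
`flip_prob` parameter below are hypothetical choices for illustration only: each child copies the value of its
parent and flips it with probability `flip_prob`, so `flip_prob = 0` corresponds to the deterministic dependencies
discussed above.

```python
import random

def sample_tree(children, root, flip_prob, rng):
    """Sample binary variables top-down through a dependency tree.

    The root is drawn uniformly; every child copies its parent's value
    and flips it with probability flip_prob (0.0 = deterministic link).
    """
    values = {root: rng.randint(0, 1)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            flipped = rng.random() < flip_prob
            values[child] = values[node] ^ int(flipped)
            stack.append(child)
    return values

# Hypothetical tree: the leaves play the role of phenotypic variables.
children = {"root": ["a", "b"], "a": ["leaf1", "leaf2"], "b": ["leaf3"]}
rng = random.Random(0)
samples = [sample_tree(children, "root", 0.0, rng) for _ in range(100)]

# With deterministic links, all leaves always vary together with the root.
leaves_follow_root = all(
    s["leaf1"] == s["leaf2"] == s["leaf3"] == s["root"] for s in samples
)
```

With `flip_prob > 0` the leaves decorrelate gradually, which is one way to picture weaker dependencies on the links.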

Several discussions in the EC community, though, contradict this point of view: Some argue that the essence
of EDAs is that they can model the evolution of crossover building blocks (schemata) by explicitly encoding the
linkage correlations that are implicit in the offspring distribution of crossover GAs (Shapiro 2003, Introduction).
In that sense, EDAs are "only" faster versions of crossover GAs; faster because EDAs actively analyze
correlations in the selection distribution whereas crossover masks would have to self-adapt (see section 6). In this
paper we want to point out that, certainly, crossover induces a correlation in the search distribution that can
be modeled by graphical models, but the concept of graphical models is far more general than that of linkage
correlations. Hence, EDAs and non-trivial gene interaction models (non-trivial genotype-phenotype mappings,
Toussaint 2003b) can introduce correlational structures in the search distribution that go qualitatively beyond
simple crossover GAs.

Most important of all: EDAs and gene interaction models can account for correlated innovation. Here,
innovation means that some phenotypic variable changes its value and some other phenotypic variables change
their values in high dependence on this change, such that the constellation of this set of variables is really new,
i.e., has not been present in the parent population. In contrast, crossover can only preserve certain linkage
correlations (determined by the crossover mask) that have been present in the parent population and never explores new
correlated constellations in the sense of correlated innovation.

The main goal of this paper is to prove and formalize the claims that have been made above. After we
define crossover in the next section, sections 3 and 4 will present some theorems on the 'structure' of the search
distribution after mutation and crossover. By structure we mean the correlational structure that we measure
by means of mutual information. Many arguments are based on the increase and decrease of mutual information
in relation to the increase or decrease of entropy in the search distribution. Section 5 finally defines the notion of
correlated exploration and thereby pinpoints the difference between linkage correlations and correlations in
EDAs or gene interaction models. Figure 2 already explains the key idea.

2 Formalism 

The Simple GA. We represent a population as a distribution $p$ over genotype space $\Omega$. In this paper we
assume that a genotype is composed of a fixed number of genes, $\Omega = \Omega^1 \times \cdots \times \Omega^N$, where the space $\Omega^i$ of
alleles of the $i$th gene is arbitrary. We represent also finite populations as a distribution $p \in \Lambda^\Omega$, namely, if the
population is given as a multiset $A = \{x_1, .., x_\mu\}$ we (isomorphically) represent it as the finite distribution given
by $p = \frac{1}{\mu} \sum_{i=1}^{\mu} \delta_{x_i}$, where $\delta_x$ is the delta distribution at $x$, i.e., $p(x) = \frac{\text{multiplicity of } x \text{ in } A}{\mu}$. Crossover
and mutation are represented as operators $\Lambda^\Omega \to \Lambda^\Omega$ that map a parental (finite or infinite) population to an
offspring distribution. Given some operator $\mathcal{U} : \Lambda^\Omega \to \Lambda^\Omega$ we will use the notation $\Delta_{\mathcal{U}} B = B(\mathcal{U}p) - B(p)$ to
denote the difference of a quantity $B : \Lambda^\Omega \to \mathbb{R}$ under transition, e.g., the quantity may be the entropy $H(p)$
of a distribution.
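
As a minimal sketch of this representation (the function names `as_distribution`, `entropy`, and `delta` are
illustrative and not taken from the paper), a finite population can be turned into its finite distribution and the
difference notation $\Delta_{\mathcal{U}} B$ evaluated for an arbitrary operator:

```python
import math
from collections import Counter

def as_distribution(population):
    """Map a multiset of genotypes to p(x) = (multiplicity of x) / mu."""
    mu = len(population)
    return {x: count / mu for x, count in Counter(population).items()}

def entropy(p):
    """Shannon entropy H(p) in nats; zero-probability entries are skipped."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def delta(quantity, operator, p):
    """Difference of a quantity under an operator: B(U p) - B(p)."""
    return quantity(operator(p)) - quantity(p)

# Hypothetical population of four two-gene genotypes.
population = [(0, 0), (0, 0), (1, 1), (0, 1)]
p = as_distribution(population)
identity_change = delta(entropy, lambda q: q, p)  # identity operator changes nothing
```

Operators such as mutation or crossover can then be written as plain functions from one such dictionary to another.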

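In that framework we may write the evolution equation of a crossover GA as
\[ p^{(t+1)} = \mathcal{S}^\mu\, \mathcal{F}^{(t)}\, \mathcal{S}^\lambda\, \mathcal{M}\, \mathcal{C}\, p^{(t)} , \]

with crossover $\mathcal{C}$, mutation $\mathcal{M}$, offspring sampling $\mathcal{S}^\lambda$, fitness $\mathcal{F}$, and parent sampling $\mathcal{S}^\mu$. A sampling operator
$\mathcal{S}^n : \Lambda^\Omega \to \Lambda^\Omega$ draws $n$ independent samples from a distribution and maps this multiset of samples to the
respective finite distribution; note that $\lim_{n\to\infty} \mathcal{S}^n = \mathrm{id}$. Fitness $\mathcal{F}^{(t)} : \Lambda^\Omega \to \Lambda^\Omega$ rescales a distribution
proportional to some functional $f^{(t)}$, $(\mathcal{F}^{(t)} p)(x) = \frac{f^{(t)}(x)\, p(x)}{\sum_{x'} f^{(t)}(x')\, p(x')}$. We define mutation and crossover more
precisely as follows: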

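Definition 2.1 (Mutation). We define mutation as an operator $\mathcal{M} : \Lambda^\Omega \to \Lambda^\Omega$ defined by the conditional
probability $\mathcal{M}(y|x)$ of mutating from $x \in \Omega$ to $y \in \Omega$:
\[ \mathcal{M}p(y) = \sum_x \mathcal{M}(y|x)\, p(x) . \]

A typical mutation operator fulfills the constraints of symmetry and component-wise independence:

a) $\mathcal{M}(y|x) = \mathcal{M}(x|y)$

b) $\Omega = \Omega^1 \times \cdots \times \Omega^N \;\Rightarrow\; \mathcal{M}(x|y) = \prod_{i=1}^{N} \mathcal{M}^i(x^i|y^i)$

In the following we will refer to the simple mutation operator for which all component-wise mutation
operators are such that the probability of mutating from $x$ to $y$ is constant for $x \neq y$:
\[ \forall i : \mathcal{M}^i = \mathcal{M}^* , \qquad \forall x \neq y \in \Omega^* : \mathcal{M}^*(x|y) = \frac{\alpha}{n} , \qquad \forall x \in \Omega^* : \mathcal{M}^*(x|x) = 1 - \frac{\alpha\,(n-1)}{n} , \]

where $n = |\Omega^*|$ and $0 < \alpha < 1$ denotes the mutation rate parameter.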

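Definition 2.2 (Crossover). We define crossover as an operator $\Lambda^\Omega \to \Lambda^\Omega$ parameterized by a mask
distribution $c \in \Lambda^{\{0,1\}^N}$, where $N$ is the number of loci (or genes) of a genome in $\Omega$:
\[ \mathcal{C} : \Lambda^\Omega \to \Lambda^\Omega : \; p \mapsto \mathcal{C}p = \sum_{x_0, x_1 \in \Omega} \mathcal{C}(\cdot\,|x_0,x_1)\; p(x_0)\, p(x_1) , \]
\[ \mathcal{C}(x|x_0,x_1) = \sum_{m \in \{0,1\}^N} c(m)\; [x = x_0 \otimes_m x_1] , \]

where the $i$th allele of the $m$-crossover-product $x_0 \otimes_m x_1$ is the $i$th allele of the parent $x_{m_i}$, i.e.,
$(x_0 \otimes_m x_1)^i = (x_{m_i})^i$. We only consider symmetric crossover, where $c(m) = c(\bar m)$.

In the case of bit strings, $\Omega = \{0,1\}^N$, it holds $x_0 \otimes_m x_1 = x_0 \otimes \bar m \oplus m \otimes x_1$, where $\oplus$ denotes the XOR and
$\otimes$ the AND. It follows that (Vose 1999, Theorem 4.4)
\[ \forall x, x_0, x_1 \in \Omega : \quad \mathcal{C}(\cdot\,|x_0,x_1) = \mathcal{C}(\cdot\,|x_1,x_0) , \qquad \mathcal{C}(x|x_0,x_1) = \mathcal{C}(0\,|\,x_0 \oplus x,\; x_1 \oplus x) . \]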
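
The crossover operator of Definition 2.2 can be sketched directly on distributions stored as dictionaries. The
helper names (`cross`, `cross_bitwise`, `crossover`, `uniform_masks`) and the example population are assumptions
made for illustration; uniform crossover is just one possible symmetric mask distribution.

```python
from itertools import product

def cross(x0, x1, m):
    """m-crossover product: the i-th allele is taken from parent x_{m_i}."""
    return tuple(x1[i] if m[i] else x0[i] for i in range(len(m)))

def cross_bitwise(x0, x1, m):
    """Bit-string form: (x0 AND not-m) XOR (m AND x1), applied per locus."""
    return tuple((a & (1 - k)) ^ (k & b) for a, b, k in zip(x0, x1, m))

def crossover(p, c):
    """Cp(x) = sum over x0, x1, m of c(m) [x = x0 (x)_m x1] p(x0) p(x1)."""
    out = {}
    for (x0, p0), (x1, p1), (m, cm) in product(p.items(), p.items(), c.items()):
        x = cross(x0, x1, m)
        out[x] = out.get(x, 0.0) + cm * p0 * p1
    return out

def uniform_masks(n_loci):
    """Uniform crossover: every mask in {0,1}^N is equally likely."""
    masks = list(product((0, 1), repeat=n_loci))
    return {m: 1.0 / len(masks) for m in masks}

# Hypothetical parent population with perfectly correlated loci.
p = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
offspring = crossover(p, uniform_masks(3))
```

In this example the two parental genotypes are spread over all eight bit strings, which already hints at how crossover trades correlation for entropy.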
Estimation-Of-Distribution Algorithms. Concerning EDAs, we write their dynamics as
\[ y^{(t+1)} = \mathcal{H}\, \mathcal{F}^{(t)}\, \mathcal{S}^\lambda\, \Phi\, y^{(t)} , \]
where, instead of a parent population, some other parameters $y^{(t)}$ (e.g. a Bayesian graph or dependency tree)
determine the offspring distribution $\Phi y^{(t)}$, which is sampled, evaluated, and, instead of a simple parent sampling,
mapped back on new parameters $y^{(t+1)}$ by some update operator $\mathcal{H}$. The operator $\mathcal{H}$ is called heuristic rule
and, in the case of Estimation-of-Distribution Algorithms, is such that the new search distribution $\Phi y^{(t+1)}$
approximates the experienced fitness distribution $\mathcal{F} \mathcal{S}^\lambda \Phi y^{(t)}$. The generic implementation of this idea is
\[ y^{(t+1)} = y^* , \qquad y^* = \operatorname*{argmin}_{y \in Y} D\big( \mathcal{F} \mathcal{S}^\lambda \Phi y^{(t)} \,\big\|\, \Phi y \big) , \]

where $Y$ is the space of feasible parameters $y$ and $D(\cdot \,\|\, \cdot)$ denotes the Kullback-Leibler distance (see Toussaint
(2003a) for a discussion of generic heuristic search and evolution). In fact, the BOA algorithm (Pelikan,
Goldberg, & Cantú-Paz 2000), which uses Bayesian networks to parameterize the search distribution, realizes
exactly this scheme. Other algorithms (Baluja & Davies 1997; Mühlenbein, Mahnig, & Rodriguez 1999; Baluja
1994) differ in some details, e.g., they use distance measures other than the Kullback-Leibler divergence or realize
a gradual adaptation of continuous parameters $y$ of the style "$y^{(t+1)} = \alpha\, y^* + (1 - \alpha)\, y^{(t)}$". See (Toussaint 2003b)
for a survey on the relation between EDAs and the evolution of genetic representations ($\sigma$-evolution) in the
context of non-trivial genotype-phenotype mappings.
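
To illustrate the Kullback-Leibler update, the following sketch restricts the parameter space $Y$ to products of
per-locus marginals (a PBIL-like model class; this restriction, the example distribution `selected`, and all
function names are assumptions for illustration). Within this family the minimizer is the product of the marginals
of the target, and the remaining divergence equals the mutual information that such a model cannot represent.

```python
import math
from itertools import product

def marginal(p, i):
    """Marginal distribution of locus i."""
    out = {}
    for x, v in p.items():
        out[x[i]] = out.get(x[i], 0.0) + v
    return out

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def product_model(marginals):
    """Search distribution of the product family, given per-locus marginals."""
    keys = [sorted(m) for m in marginals]
    return {x: math.prod(m[a] for m, a in zip(marginals, x)) for x in product(*keys)}

def kl(q, model):
    """Kullback-Leibler divergence D(q || model); model must cover q's support."""
    return sum(v * math.log(v / model[x]) for x, v in q.items() if v > 0)

# Hypothetical selected ("experienced fitness") distribution over two binary loci.
selected = {(0, 0): 0.5, (1, 1): 0.3, (0, 1): 0.1, (1, 0): 0.1}

best = [marginal(selected, i) for i in range(2)]   # KL-optimal product parameters
other = [{0: 0.5, 1: 0.5}, {0: 0.7, 1: 0.3}]       # some other product parameters

kl_best = kl(selected, product_model(best))
kl_other = kl(selected, product_model(other))
mutual_info = sum(entropy(m) for m in best) - entropy(selected)
```

Richer model classes such as dependency trees or Bayesian networks can reduce this residual divergence, which is exactly the mutual information they carry over to the search distribution.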

3 The structure of the mutation distribution 

This section derives a theorem that simply states that mutation increases entropy and decreases mutual
information. It is surprising how non-trivial it is to prove this intuitively trivial statement.

Lemma 3.1 (Component-wise mutation). Consider the component-wise simple mutation operator $\mathcal{M}^*$ as
given in definition 2.1. It follows that

a)
\[ \mathcal{M}^* p(x) = (1 - \alpha)\, p(x) + \alpha\, \frac{1}{n} , \]

which is a linear mixture between $p$ and the uniform distribution ("$\frac{1}{n}$") with mixture parameter $\alpha$.
b) For every non-uniform population $p$, the entropy of $\mathcal{M}^* p$ is greater than the entropy of $p$,
$H(\mathcal{M}^* p) > H(p)$.

Proof. a)
\[ \mathcal{M}^* p(x) = \sum_y \mathcal{M}^*(x|y)\, p(y) = \sum_y \frac{\alpha}{n}\, p(y) + \Big[ \Big(1 - \frac{\alpha\,(n-1)}{n}\Big) - \frac{\alpha}{n} \Big]\, p(x) = \frac{\alpha}{n} + (1 - \alpha)\, p(x) . \]

b) We generally show that the entropy increases if you mix a distribution with the uniform distribution.
We prove this by considering the first two derivatives of the entropy functional with respect to the mixture
parameter $\alpha$. Let
\[ q(x) = (1 - \alpha)\, p(x) + \frac{\alpha}{n} , \]
and recall $H(q) = -\sum_x q(x) \ln q(x)$ and $(X \ln X)' = X'\,((\ln X) + 1)$. It follows
\[ \frac{\partial}{\partial \alpha} H(q) = -\sum_x \Big[ -p(x) + \frac{1}{n} \Big] \big( \ln q(x) + 1 \big) = \sum_x \Big[ p(x) - \frac{1}{n} \Big] \ln q(x) , \]
\[ \frac{\partial}{\partial \alpha} H(q) \Big|_{\alpha=1} = \sum_x \Big[ p(x) - \frac{1}{n} \Big] \ln \frac{1}{n} = 0 , \]
\[ \frac{\partial^2}{\partial \alpha^2} H(q) = -\sum_x \frac{\big[ p(x) - \frac{1}{n} \big]^2}{q(x)} < 0 \quad \text{if } p \text{ is non-uniform.} \]

What we found is that (1.) the entropy is maximal for the extreme case $\alpha = 1$ since its derivative w.r.t. $\alpha$ at this
point vanishes (of course, this corresponds to the trivial case where $q$ becomes the uniform distribution) and
(2.) the second derivative is always negative if $p$ is non-uniform. Hence, the plot of $H$ versus $\alpha$ is comparable
to an upside-down parabola with maximum at $\alpha = 1$. It follows that for all $\alpha < 1$ (to the left of the maximum)
the derivative $\frac{\partial}{\partial \alpha} H(q)$ is positive. Entropy continuously increases with $\alpha$. And hence, for every $0 < \alpha < 1$ and
every non-uniform population $p$, $H(\mathcal{M}^* p) > H(p)$. □
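
Both parts of Lemma 3.1 are easy to check numerically. The following is a sketch under assumed example values
(the population `p` and the rate `alpha` are arbitrary): it builds the kernel $\mathcal{M}^*$ explicitly, compares the
result with the mixture form, and evaluates the entropy along the mixture parameter.

```python
import math

def simple_mutation_kernel(n, alpha):
    """M*(x|y) = alpha/n for x != y and 1 - alpha*(n-1)/n for x == y."""
    return [[(1 - alpha * (n - 1) / n) if x == y else alpha / n
             for y in range(n)] for x in range(n)]

def apply_kernel(kernel, p):
    """(M* p)(x) = sum_y M*(x|y) p(y)."""
    n = len(p)
    return [sum(kernel[x][y] * p[y] for y in range(n)) for x in range(n)]

def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)

# Hypothetical non-uniform population over n = 4 alleles.
p = [0.7, 0.2, 0.1, 0.0]
n = len(p)
alpha = 0.3

mutated = apply_kernel(simple_mutation_kernel(n, alpha), p)
mixture = [(1 - alpha) * v + alpha / n for v in p]          # Lemma 3.1 a)

# Entropy of the alpha-mixture for increasing alpha, Lemma 3.1 b).
entropies = [entropy([(1 - a) * v + a / n for v in p])
             for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

The list `entropies` increases monotonically and ends at $\ln n$, the entropy of the uniform distribution.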

Theorem 3.2. Consider the simple mutation operator $\mathcal{M}(x|y) = \prod_{i=1}^{N} \mathcal{M}^*(x^i|y^i)$ as given in definition 2.1. If
$p \in \Lambda^\Omega$ is non-uniform it follows that entropy increases, $H(\mathcal{M}p) > H(p)$, and mutual information decreases,
$I(\mathcal{M}p) < I(p)$.



Proof. We first prove that the cross entropy (used here synonymously with the mutual information $I$) decreases.
Assuming only two genes, the compound mutation distribution reads
\[ \mathcal{M}p(x,y) = (1-\alpha)^2\, p(x,y) + (1-\alpha)\,\alpha\, p(x)\, \frac{1}{n} + (1-\alpha)\,\alpha\, \frac{1}{n}\, p(y) + \alpha^2\, \frac{1}{n}\, \frac{1}{n} \]
\[ = (1-\alpha) \Big[ (1-\alpha)\, p(x,y) + \alpha\, \frac{1}{n}\, p(x) \Big] + \alpha\, \frac{1}{n} \Big[ (1-\alpha)\, p(y) + \alpha\, \frac{1}{n} \Big] \]
\[ = (1-\alpha)\, q(x,y) + \alpha\, \frac{1}{n}\, q(y) , \]
where
\[ q(x,y) = (1-\alpha)\, p(x,y) + \alpha\, p(x)\, \frac{1}{n} , \qquad q(x) = p(x) , \qquad q(y) = (1-\alpha)\, p(y) + \alpha\, \frac{1}{n} . \]

We call $q$ a one-component $\alpha$-mixture since only in one component the uniform distribution was mixed to $p$.
This shows that the compound distribution for two genes is a one-component $\alpha$-mixture of a distribution $q$,
which is itself a one-component $\alpha$-mixture. For compound distributions with more than two genes this will be
recursively the case and generally the mutation operator can be expressed as a concatenation of one-component
$\alpha$-mixtures. Hence, it suffices to prove that the mutual information decreases for one such step of
one-component $\alpha$-mixing.

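We use the same technique of calculating derivatives with respect to the mixture parameter to prove decreasing
cross entropy. To simplify the notation we use the abbreviations:
\[ A = q(x,y) , \qquad A\big|_{\alpha=1} = \frac{p(x)}{n} , \qquad A' = \frac{\partial}{\partial \alpha} A = -p(x,y) + \frac{p(x)}{n} , \qquad A'' = 0 , \]
\[ B = q(x)\, q(y) = p(x) \Big[ (1-\alpha)\, p(y) + \frac{\alpha}{n} \Big] , \qquad B\big|_{\alpha=1} = \frac{p(x)}{n} , \qquad B' = p(x) \Big( -p(y) + \frac{1}{n} \Big) , \qquad B'' = 0 . \]

With these abbreviations (keeping the dependencies on $x$, $y$, and $\alpha$ in mind) we can write:
\[ I(q) = \sum_{x,y} A \ln \frac{A}{B} , \]
\[ \frac{\partial}{\partial \alpha} I(q) = \sum_{x,y} \Big[ A' \ln \frac{A}{B} + A' - \frac{A B'}{B} \Big] , \]
\[ \frac{\partial}{\partial \alpha} I(q) \Big|_{\alpha=1} = \sum_{x,y} \Big[ A' - \frac{A B'}{B} \Big]_{\alpha=1} = \sum_{x,y} \Big[ -p(x,y) + \frac{p(x)}{n} - p(x) \Big( -p(y) + \frac{1}{n} \Big) \Big] = 0 , \]
\[ \frac{\partial^2}{\partial \alpha^2} I(q) = \sum_{x,y} \Big[ \frac{A'^2}{A} - 2\, \frac{A' B'}{B} + \frac{A\, (B')^2}{B^2} \Big] = \sum_{x,y} \frac{(B A' - A B')^2}{A B^2} > 0 . \]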

So, what we found is that (1.) for $\alpha = 1$ the cross entropy is minimal since its derivative w.r.t. $\alpha$ at this point
vanishes (of course, this corresponds to the trivial case where $q(x,y) = p(x)\, \frac{1}{n}$) and (2.) for all other points the
second derivative is positive. The plot of $I$ versus $\alpha$ is comparable to an upwards parabola with minimum at
$\alpha = 1$. It follows that for $\alpha < 1$ (to the left of the minimum) the derivative $\frac{\partial}{\partial \alpha} I(q)$ is negative and thus the
cross entropy continuously decreases with increasing $\alpha$.

Concerning increasing entropy, it is obvious that the marginals of the mutation distribution are simply
\[ (\mathcal{M}p)^i = \mathcal{M}^* p^i . \]
For the component-wise mutation operators we proved that entropy increases (for non-zero $\alpha$ and non-uniform
$p$) and thus $\Delta_{\mathcal{M}} H^i > 0$. Consequently,
\[ \Delta_{\mathcal{M}} H = \sum_i \Delta_{\mathcal{M}} H^i - \Delta_{\mathcal{M}} I > 0 . \]
□
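
The statement of Theorem 3.2 can be checked on a small example. The sketch below assumes two binary genes, an
arbitrary correlated population `p`, and mutation rate `alpha = 0.2`; mutation is applied as a concatenation of
one-component $\alpha$-mixtures, mirroring the structure of the proof.

```python
import math

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginal(p, i):
    out = {}
    for x, v in p.items():
        out[x[i]] = out.get(x[i], 0.0) + v
    return out

def mutual_information(p, n_genes):
    """I(p) = sum_i H(p^i) - H(p)."""
    return sum(entropy(marginal(p, i)) for i in range(n_genes)) - entropy(p)

def mix_component(p, i, alpha, alleles):
    """One-component alpha-mixture: mix the uniform distribution into gene i."""
    out = {}
    for x, v in p.items():
        for a in alleles:
            y = x[:i] + (a,) + x[i + 1:]
            weight = alpha / len(alleles) + ((1 - alpha) if a == x[i] else 0.0)
            out[y] = out.get(y, 0.0) + weight * v
    return out

def mutate(p, alpha, alleles, n_genes):
    """Simple mutation as a concatenation of one-component alpha-mixtures."""
    for i in range(n_genes):
        p = mix_component(p, i, alpha, alleles)
    return p

# Hypothetical correlated population over two binary genes.
p = {(0, 0): 0.5, (1, 1): 0.3, (0, 1): 0.1, (1, 0): 0.1}
q = mutate(p, alpha=0.2, alleles=(0, 1), n_genes=2)

delta_H = entropy(q) - entropy(p)
delta_I = mutual_information(q, 2) - mutual_information(p, 2)
```

In this example `delta_H` is positive and `delta_I` is negative, as the theorem states.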

4 The structure of the crossover distribution 

What is the structure of the crossover search distribution $\mathcal{C}p$, given $p \in \Lambda^\Omega$ and $c \in \Lambda^{\{0,1\}^N}$? The first theorem
can directly be derived from our definition of the crossover operator. It captures the most basic properties of
the crossover operator with respect to the correlations it destroys in the search distribution:

Theorem 4.1. Let $H(p)$, $p^i$, $H^i(p) = H(p^i)$, and $I(p) = \sum_i H^i(p) - H(p)$ denote the entropy, the $i$th marginal
distribution, the marginal entropies, and the mutual information of a distribution $p$. For any crossover operator
$\mathcal{C}$ and any population $p$ it holds

a) $\forall i : (\mathcal{C}p)^i = p^i$, $\Delta_{\mathcal{C}} H^i = 0$, i.e., the marginals and hence their entropies do not change,

b) $\Delta_{\mathcal{C}} I = -\Delta_{\mathcal{C}} H \le 0$, i.e., the increase of entropy is equal to the decrease of mutual information.

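Proof. Let us first calculate the marginals after crossover. Let $a$ be an allele of the $i$th gene.
\[ (\mathcal{C}p)^i(a) = \sum_{x_0,x_1} \sum_m c(m)\, [a = (x_{m_i})^i]\; p(x_0)\, p(x_1) \]
\[ = \sum_{x_0,x_1} \sum_{m : m_i = 0} c(m)\, [a = (x_0)^i]\; p(x_0)\, p(x_1) + \sum_{x_0,x_1} \sum_{m : m_i = 1} c(m)\, [a = (x_1)^i]\; p(x_0)\, p(x_1) \]
\[ = p^i(a) \Big[ \sum_{m : m_i = 0} c(m) \Big] + p^i(a) \Big[ \sum_{m : m_i = 1} c(m) \Big] = p^i(a) . \]

Since the marginals are not changed by crossover, the marginal entropies do not change either. Statement b)
follows from the definition of the mutual information:
\[ \Delta_{\mathcal{C}} H + \Delta_{\mathcal{C}} I = H(\mathcal{C}p) - H(p) + I(\mathcal{C}p) - I(p) \]
\[ = H(\mathcal{C}p) - H(p) + \sum_i H^i(\mathcal{C}p) - H(\mathcal{C}p) - \Big[ \sum_i H^i(p) - H(p) \Big] \]
\[ = \sum_i H^i(\mathcal{C}p) - \sum_i H^i(p) = 0 . \]
□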
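
Theorem 4.1 can be verified numerically for a concrete case. The population `p` and the symmetric mask
distribution `c` below are hypothetical examples (the masks resemble one-point crossover on three loci); the code
checks that the marginals are unchanged and that the entropy gained equals the mutual information lost.

```python
import math
from itertools import product

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginal(p, i):
    out = {}
    for x, v in p.items():
        out[x[i]] = out.get(x[i], 0.0) + v
    return out

def mutual_information(p, n_loci):
    return sum(entropy(marginal(p, i)) for i in range(n_loci)) - entropy(p)

def crossover(p, c):
    """Cp(x) = sum over x0, x1, m of c(m) [x = x0 (x)_m x1] p(x0) p(x1)."""
    out = {}
    for (x0, p0), (x1, p1), (m, cm) in product(p.items(), p.items(), c.items()):
        x = tuple(x1[k] if m[k] else x0[k] for k in range(len(m)))
        out[x] = out.get(x, 0.0) + cm * p0 * p1
    return out

# Hypothetical population over three binary loci.
p = {(0, 0, 0): 0.5, (1, 1, 0): 0.3, (0, 1, 1): 0.2}
# Symmetric mask distribution: c(m) equals c(complement of m).
c = {(0, 0, 0): 0.25, (1, 1, 1): 0.25,
     (0, 0, 1): 0.125, (1, 1, 0): 0.125,
     (0, 1, 1): 0.125, (1, 0, 0): 0.125}

cp = crossover(p, c)
delta_H = entropy(cp) - entropy(p)
delta_I = mutual_information(cp, 3) - mutual_information(p, 3)
marginals_unchanged = all(
    abs(marginal(cp, i).get(a, 0.0) - v) < 1e-12
    for i in range(3) for a, v in marginal(p, i).items()
)
```

Here `delta_H + delta_I` vanishes up to rounding, while `delta_H` itself is non-negative.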

The following theorem makes this more concrete when focusing on two specific genes of a genome of arbitrary
length. We calculate the mutual information between these two genes in the search distribution $\mathcal{C}p$, which
is a measure for the linkage between them. Let it be the $i$th and $j$th gene. We use $a$ and $b$ as alleles;
$p^{ij}(a,b) = \sum_{x \in \Omega} [x^i = a]\, [x^j = b]\; p(x)$ denotes the probability that the $i$th gene has allele $a$ and the $j$th gene
allele $b$. Analogously, let $c^{ij}$ be the marginal of the crossover mask distribution with respect to the two genes,
i.e., $c^{ij}(01) = \sum_{m \in \{0,1\}^N} [m_i = 0]\, [m_j = 1]\; c(m)$.

Theorem 4.2. For any crossover operator $\mathcal{C}$ and any population $p$ it holds:

a) The compound distribution of two genes after crossover is given by
\[ (\mathcal{C}p)^{ij}(a,b) = 2\, c^{ij}(00)\; p^{ij}(a,b) + 2\, c^{ij}(01)\; p^i(a)\, p^j(b) , \]
i.e., a linear combination of the original compound distribution $p^{ij}(a,b)$ and the decorrelated product
distribution $p^i(a)\, p^j(b)$.

b) The mutual information $I(\mathcal{C}p)^{ij}$ in the compound distribution of two specific genes is
\[ I(\mathcal{C}p)^{ij} = \sum_{a,b} \Big( 2\, c^{ij}(00)\; p^{ij}(a,b) + 2\, c^{ij}(01)\; p^i(a)\, p^j(b) \Big) \ln \Big( 2\, c^{ij}(00)\, \frac{p^{ij}(a,b)}{p^i(a)\, p^j(b)} + 2\, c^{ij}(01) \Big) , \]

c) and we have
\[ 0 \le 2\, c^{ij}(00)\, \big( I(p)^{ij} + \ln(2\, c^{ij}(00)) \big) \le I(\mathcal{C}p)^{ij} \le I(p)^{ij} . \]
The two left $\le$ are exact for complete crossover, $c^{ij}(00) = 0$, $c^{ij}(01) = \frac{1}{2}$, the right $\le$ is exact for no
crossover, $c^{ij}(00) = \frac{1}{2}$, $c^{ij}(01) = 0$.

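Proof. a)
\[ (\mathcal{C}p)^{ij}(a,b) = \sum_{x_0,x_1} \sum_m c(m)\, [(x_{m_i})^i = a]\, [(x_{m_j})^j = b]\; p(x_0)\, p(x_1) \]
\[ = \sum_{x_0,x_1} \Big( c^{ij}(00)\, [(x_0)^i = a][(x_0)^j = b] + c^{ij}(01)\, [(x_0)^i = a][(x_1)^j = b] \]
\[ \qquad\quad + c^{ij}(10)\, [(x_1)^i = a][(x_0)^j = b] + c^{ij}(11)\, [(x_1)^i = a][(x_1)^j = b] \Big)\; p(x_0)\, p(x_1) \]
\[ = 2 \sum_{x_0} c^{ij}(00)\, [(x_0)^i = a][(x_0)^j = b]\; p(x_0) + 2 \sum_{x_0,x_1} c^{ij}(01)\, [(x_0)^i = a][(x_1)^j = b]\; p(x_0)\, p(x_1) \]
\[ = 2\, c^{ij}(00)\; p^{ij}(a,b) + 2\, c^{ij}(01)\; p^i(a)\, p^j(b) . \]

b&c)
\[ I(\mathcal{C}p)^{ij} = H((\mathcal{C}p)^i) + H((\mathcal{C}p)^j) - H((\mathcal{C}p)^{ij}) = H(p^i) + H(p^j) - H((\mathcal{C}p)^{ij}) \]
\[ \le H(p^i) + H(p^j) - H(p^{ij}) = I(p)^{ij} \]

\[ H((\mathcal{C}p)^{ij}) = -\sum_{a,b} \Big( 2 c^{ij}(00)\, p^{ij}(a,b) + 2 c^{ij}(01)\, p^i(a) p^j(b) \Big) \ln \Big( 2 c^{ij}(00)\, p^{ij}(a,b) + 2 c^{ij}(01)\, p^i(a) p^j(b) \Big) \]
\[ = -\sum_{a,b} \Big( 2 c^{ij}(00)\, p^{ij}(a,b) + 2 c^{ij}(01)\, p^i(a) p^j(b) \Big) \Big[ \ln \Big( 2 c^{ij}(00)\, \frac{p^{ij}(a,b)}{p^i(a)\, p^j(b)} + 2 c^{ij}(01) \Big) + \ln p^i(a) + \ln p^j(b) \Big] \]
\[ = -\sum_{a,b} \Big( 2 c^{ij}(00)\, p^{ij}(a,b) + 2 c^{ij}(01)\, p^i(a) p^j(b) \Big) \ln \Big( 2 c^{ij}(00)\, \frac{p^{ij}(a,b)}{p^i(a)\, p^j(b)} + 2 c^{ij}(01) \Big) + H(p^i) + H(p^j) \]

\[ I(\mathcal{C}p)^{ij} = \sum_{a,b} \Big( 2 c^{ij}(00)\, p^{ij}(a,b) + 2 c^{ij}(01)\, p^i(a) p^j(b) \Big) \ln \Big( 2 c^{ij}(00)\, \frac{p^{ij}(a,b)}{p^i(a)\, p^j(b)} + 2 c^{ij}(01) \Big) \]
\[ \ge \sum_{a,b} \Big( 2 c^{ij}(00)\, p^{ij}(a,b) \Big) \ln \Big( 2 c^{ij}(00)\, \frac{p^{ij}(a,b)}{p^i(a)\, p^j(b)} \Big) = 2\, c^{ij}(00)\, \big( I(p)^{ij} + \ln(2\, c^{ij}(00)) \big) \]
□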
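
Part a) of Theorem 4.2 and the upper bound of part c) can be checked on an example. As before, the population `p`,
the symmetric mask distribution `c`, and the chosen loci `i, j` are hypothetical; the sketch compares the two-locus
marginal of the crossed population with the predicted linear combination.

```python
import math
from itertools import product

def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def marginal(p, loci):
    """Marginal over a tuple of loci; keys are tuples of the selected alleles."""
    out = {}
    for x, v in p.items():
        key = tuple(x[k] for k in loci)
        out[key] = out.get(key, 0.0) + v
    return out

def pair_information(p, i, j):
    """Mutual information between loci i and j."""
    return (entropy(marginal(p, (i,))) + entropy(marginal(p, (j,)))
            - entropy(marginal(p, (i, j))))

def crossover(p, c):
    out = {}
    for (x0, p0), (x1, p1), (m, cm) in product(p.items(), p.items(), c.items()):
        x = tuple(x1[k] if m[k] else x0[k] for k in range(len(m)))
        out[x] = out.get(x, 0.0) + cm * p0 * p1
    return out

p = {(0, 0, 0): 0.5, (1, 1, 0): 0.3, (0, 1, 1): 0.2}
c = {(0, 0, 0): 0.25, (1, 1, 1): 0.25,
     (0, 0, 1): 0.125, (1, 1, 0): 0.125,
     (0, 1, 1): 0.125, (1, 0, 0): 0.125}
i, j = 0, 2

# Marginal mask probabilities c^{ij}(00) and c^{ij}(01).
c00 = sum(v for m, v in c.items() if (m[i], m[j]) == (0, 0))
c01 = sum(v for m, v in c.items() if (m[i], m[j]) == (0, 1))

cp = crossover(p, c)
crossed = marginal(cp, (i, j))
pij, pi, pj = marginal(p, (i, j)), marginal(p, (i,)), marginal(p, (j,))
predicted = {(a, b): 2 * c00 * pij.get((a, b), 0.0) + 2 * c01 * pi[(a,)] * pj[(b,)]
             for (a,), (b,) in product(pi, pj)}
```

The dictionaries `crossed` and `predicted` agree entry by entry, and the pairwise mutual information after crossover does not exceed its value in `p`.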
[Figure 1 (image not reproduced); axis label: mutual information in the search distribution]

Figure 1: The degree of mutual information carried over to the exploration distribution. $I(p)$ is the mutual
information contained in the parent population. Gene pool GAs, crossover, and mutation destroy correlations.
EDAs and evolution in the case of complex genotype-phenotype mappings (see Toussaint 2003) may induce
correlated exploration.

Let us summarize what we actually found in the above theorems: 

• The marginal distributions do not change at all. There is no exploration w.r.t. the alleles of single genes. 

• The more entropy crossover introduces in a population, the more the mutual dependencies between genes 
are destroyed. Actually, crossover destroys mutual information in the parent population by transforming 
it into entropy in the crossed population. In particular, if there is no mutual information in the parent 
population, crossover will not generate any more entropy. That's linkage equilibrium. 

• The last theorem shows how the crossover mask distribution $c$ determines which correlations in the parent
population are destroyed and transformed into entropy.

The purpose of these theorems is to propose a probably nonstandard point of view on what crossover actually
does: It seems misleading to say that crossover introduces the notion of building blocks. Actually, a
non-crossover GA comprises the strongest and most natural building blocks; individuals as such are the building
blocks that carry the mutual information between their genes. Crossover is a means to break these maximal
building blocks apart into smaller pieces by converting mutual dependencies into entropy. As a result it induces
smaller, more fine-grained building blocks with, in total, less mutual information in the crossed population. It
is thus questionable to state that the correlational structure in the crossed population is more complex with
crossover; actually it is simpler since it carries less information. In the limit of linkage equilibrium (or uniform
$c$), all correlations have been destroyed and the crossed population becomes a product distribution.



5 Correlated exploration 

What crossover and EDAs share is that both introduce a non-trivial correlational structure in the search
distribution. The crucial difference is that Estimation-of-Distribution Algorithms try to "carry over" the correlations
in the population of selected individuals to the search distribution whereas crossover destroys correlations. Carrying over
correlations is non-trivial if the search distribution is to be explorative, i.e., of more entropy: Typically mutation
operators add entropy to the distribution by adding independent noise to each marginal, but this reduces the
mutual information between genes (see Theorem 3.2).

Consider Figure 2. In a finite population of 3 individuals, marked by crosses, the values at the 1st
and 2nd loci are correlated (here illustrated by plotting them on the bisecting line). The crossed population
$\mathcal{C}p$ comprises at most 9 different individuals; in the special cases $c^{ij}(01) = 0$ and $c^{ij}(01) = \frac{1}{2}$ the population
is even finite and comprises 3 respectively 9 equally weighted individuals marked by circles. Mutation adds
independent noise, illustrated by the gray shading, to the alleles of each individual. The two illustrations for
the GA demonstrate that crossover destroys correlations between the alleles in the initial population instead
of carrying them over to the search distribution: the gray shading is not focused on the bisecting line. Instead,
an EDA would first estimate the distribution of the individuals in $p$. Depending on what probabilistic model
is used, this model can capture the correlations between the alleles; in the illustration the model could be a
Gaussian parameterized by the mean and covariance matrix (just as for the CMA evolution strategy (Hansen &
Ostermeier 2001)) and the estimation of the covariance in $p$ leads to the highly structured search distribution in
which the entropy of each marginal is increased without destroying the correlations between them. We capture
this difference in the following definition:

[Figure 2 (image not reproduced); three panels: "GA, $c_{01} = 0$", "GA, $c_{01} = \frac{1}{2}$", and "EDA" (labeled "correlated
exploration"); axis label: 1. locus. Legend: × individuals in the finite population $p$; ○ individuals in the crossed
population $\mathcal{C}p$; gray shading: exploration distribution $\mathcal{M}\mathcal{C}p$.]

Figure 2: Illustration of the type of correlations in GAs with and without crossover in comparison to correlated
exploration in EDAs. The gray shades indicate the exploration distributions, say, regions of probability greater
than some constant. The degree to which the gray shading is aligned with the bisecting line indicates
correlatedness. The crossover GA in the middle destroys correlations whereas EDAs may induce high correlations.

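Definition 5.1 (Correlated exploration). Let $\mathcal{U} : \Lambda^\Omega \to \Lambda^\Omega$ be an operator. The following conditions need
to hold for almost all $p \in \Lambda^\Omega$, which means: for all of the space $\Lambda^\Omega$ except for a subspace of measure zero. We define

• $\mathcal{U}$ is explorative $\iff$ $\Delta_{\mathcal{U}} H > 0$ for almost all $p \in \Lambda^\Omega$,

• $\mathcal{U}$ is marginally explorative $\iff$ $\mathcal{U}$ is explorative and $\exists i : \Delta_{\mathcal{U}} H^i > 0$ for almost all $p \in \Lambda^\Omega$,

• $\mathcal{U}$ is correlated explorative $\iff$ $\mathcal{U}$ is explorative and $\Delta_{\mathcal{U}} I > 0$, or equivalently $0 < \Delta_{\mathcal{U}} H < \sum_i \Delta_{\mathcal{U}} H^i$,
for almost all $p \in \Lambda^\Omega$.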

Corollary 5.1. From this definition it follows that

a) If and only if there exist two loci $i$ and $j$ such that the marginal crossover mask distribution $c^{ij}(01)$ for
these two loci is non-vanishing, $c^{ij}(01) = c^{ij}(10) > 0$, then crossover $\mathcal{C}$ is explorative. For every mask
distribution $c \in \Lambda^{\{0,1\}^N}$, crossover $\mathcal{C}$ is neither marginally nor correlated explorative.

b) Mutation $\mathcal{M}$ is marginally but not correlated explorative.

c) Mutation and crossover $\mathcal{M} \circ \mathcal{C}$ are marginally but not correlated explorative.

d) In the case of a non-trivial genotype-phenotype mapping, mutation as well as crossover can be phenotypically
correlated explorative.

Proof. a) That $\mathcal{C}$ is neither marginally nor correlated explorative follows directly from Theorem 4.1a, which
says that for every $c \in \Lambda^{\{0,1\}^N}$ and any population $p \in \Lambda^\Omega$ the marginals of the population do not change under
crossover, $\Delta_{\mathcal{C}} H^i = 0$. But under which conditions is $\mathcal{C}$ explorative?

If, for two loci $i$ and $j$, $c^{ij}(01)$ is non-vanishing, it follows that $\mathcal{C}$ reduces the mutual information between
these two loci (Theorem 4.2c). The subspace of populations $p$ that do not have any mutual information
between these two loci is of measure zero. Hence, for almost all $p$, $\Delta_{\mathcal{C}} I^{ij} < 0$ and, following Theorem 4.1b, this
automatically leads to an increase of entropy $\Delta_{\mathcal{C}} H^{ij} > 0$ in the compound distribution of the two loci and,
since $\Delta_{\mathcal{C}} H \ge \Delta_{\mathcal{C}} H^{ij}$, also of the total entropy.

The other way around, if, for every two loci $i$ and $j$, $c^{ij}(01)$ vanishes, it follows that there is no crossover, i.e.,
only the all-0s and all-1s crossover masks have non-vanishing probability. Hence, $\mathcal{C} = \mathrm{id}$ and is not explorative.

b) In Theorem 3.2 we prove that for every non-uniform population $p$, $\Delta_{\mathcal{M}} H > 0$, $\Delta_{\mathcal{M}} H^i > 0$, and $\Delta_{\mathcal{M}} I < 0$.

c) Since both mutation and crossover are not correlative, it follows that their composition is also not
correlative:
\[ \Delta_{\mathcal{C}} I \le 0 , \quad \Delta_{\mathcal{M}} I < 0 \quad \Rightarrow \quad \Delta_{\mathcal{M}\mathcal{C}} I < 0 . \]

d) What is different in the case of a non-trivial genotype-phenotype mapping? The assumptions we made
about the mutation operator (component-wise independence) hold only on the genotype space, no longer on
the phenotype space: On genotype space mutation kernels are product distributions and mutative exploration
is marginally explorative but not correlated; projected on phenotype space, the mutation kernels are in general
no longer product distributions and hence phenotypic mutative exploration can be correlated. □

6 Conclusion 

There are three main points to conclude: 

• First, we point out that crossover does the inverse of correlated exploration. It destroys correlations
in the exploration distribution by transforming them into entropy. In an information theoretic sense,
the exploration distribution after crossover is less complex (carrying less mutual information) than before
crossover. To me it seems just "countersensible" to base the notion of building blocks on a discussion
of crossover. The most natural building blocks are individuals carrying the mutual information between
genes within the exploration distributions. Crossover splits these building blocks into smaller ones.

• Of course, the crossover exploration distribution can be modeled by graphical models since graphical
models can model any distribution. In that respect, one could certainly design search algorithms based on
probabilistic models of the search distribution (instead of a population) that model crossover GAs; the
PBIL is a candidate. However, I would question calling such an algorithm an Estimation-of-Distribution
algorithm because its objective is not to really estimate the distribution of selected individuals and in particular the
correlations within this distribution (with the exception of PBIL, whose objective is to only estimate the
marginals, which coincides with modeling crossover). In general, EDAs go beyond modeling crossover since
they introduce a quality which is not a quality of crossover: correlated exploration.

• Finally, there is a crucial difference between EDAs and (crossover) GAs with respect to the self-adaptation
of the exploration distribution. EDAs always adapt their search distribution (including correlations)
according to the distribution of previously selected solutions. In contrast, the crossover mask distribution,
which determines where correlations are destroyed or preserved, is usually not self-adaptive. However, if
we consider a phenotypic level (i.e., an indirect encoding), then both mutational exploration
and crossover exploration can be correlated and self-adaptive on the phenotypic level (see the theory on
$\sigma$-evolution in (Toussaint 2003b), realizing that both the definition of mutation and the definition of
crossover do not commute with phenotypic equivalence).

References 

Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function
optimization and competitive learning. Technical Report CMU-CS-94-163, Computer Science Department, Carnegie
Mellon University.






Marc Toussaint — February 7, 2008 



Baluja, S. & S. Davies (1997). Using optimal dependency-trees for combinatorial optimization: Learning the structure
of the search space. In Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997),
pp. 30-38.

Halder, G., P. Callaerts, & W. Gehring (1995). Induction of ectopic eyes by targeted expression of the eyeless gene in
Drosophila. Science 267, 1788-1792.

Hansen, N. & A. Ostermeier (2001). Completely derandomized self-adaption in evolutionary strategies. Evolutionary
Computation 9, 159-195.

Holland, J. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, USA.

Holland, J. H. (2000). Building blocks, cohort genetic algorithms, and hyperplane-defined functions. Evolutionary
Computation 8, 373-391.

Mühlenbein, H., T. Mahnig, & A. O. Rodriguez (1999). Schemata, distributions and graphical models in evolutionary
optimization. Journal of Heuristics 5, 215-247.

Pelikan, M., D. E. Goldberg, & E. Cantú-Paz (2000). Linkage problem, distribution estimation, and Bayesian networks.
Evolutionary Computation 9, 311-340.

Pelikan, M., D. E. Goldberg, & F. Lobo (1999). A survey of optimization by building and using probabilistic models.
Technical Report IlliGAL-99018, Illinois Genetic Algorithms Laboratory.

Shapiro, J. L. (2003). The sensitivity of PBIL to the learning rate, and how detailed balance can remove it. In C. Cotta,
K. De Jong, R. Poli, & J. Rowe (Eds.), Foundations of Genetic Algorithms 7 (FOGA VII). Morgan Kaufmann.
To appear in Spring 2003.

Toussaint, M. (2003a). Modeling the evolution of phenotypic variability. Submitted to Journal of Theoretical Biology.

Toussaint, M. (2003b). On the evolution of phenotypic exploration distributions. In C. Cotta, K. De Jong, R. Poli, &
J. Rowe (Eds.), Foundations of Genetic Algorithms 7 (FOGA VII). Morgan Kaufmann. To appear in Spring 2003.

Vose, M. D. (1999). The Simple Genetic Algorithm. MIT Press, Cambridge.

Wagner, G. P. & L. Altenberg (1996). Complex adaptations and the evolution of evolvability. Evolution 50, 967-976.






